MEASURES WITH ZEROS IN THE INVERSE OF THEIR MOMENT MATRIX
Mathematics, Toulouse and University of California, Santa Barbara

We investigate and discuss when the inverse of a multivariate
truncated moment matrix of a measure $\mu$ has zeros in some prescribed
entries. We describe precisely which pattern of these zeros
corresponds to independence, namely, the measure having a product
structure. A more refined finding is that the key factor forcing a
zero entry in this inverse matrix is a certain conditional triangularity
property of the orthogonal polynomials associated with $\mu$.

1. Introduction. It is well known that zeros in off-diagonal entries of
the inverse $M^{-1}$ of an $n \times n$ covariance matrix $M$ identify pairs of random
variables that have no partial correlation (and so are conditionally independent
in case of normally distributed vectors); see, for example, Whittaker [7],
Corollary 6.3.4. Allowing zeros in the off-diagonal entries of $M^{-1}$ is
particularly useful for Bayesian estimation of regression models in statistics,
and is called Bayesian covariance selection. Indeed, estimating a covariance
matrix is a difficult problem for a large number of variables, and exploiting
sparsity in $M^{-1}$ may yield efficient methods for Graphical Gaussian Models
(GGM). For more details, the interested reader is referred to Cripps, Carter
and Kohn [3] and the many references therein.

The covariance matrix can be thought of as a matrix whose entries are
second moments of a measure. This paper focuses on the truncated moment
matrices, $M_d$, consisting of moments up to an order determined by
$d$. First, we describe precisely the pattern of zeros of $M_d^{-1}$ resulting from
the measure having a product type structure. Next, we turn to the study
of a particular entry of $M_d^{-1}$ being zero. We find that the key is what we
call the conditional triangularity property of orthonormal polynomials (OP)
up to degree $2d$, associated with the measure. To give the flavor of what
this means, let, for instance, $\mu$ be the joint distribution of $n$ random
variables $X = (X_1, \ldots, X_n)$, and let $\{p_\alpha\} \subset \mathbb{R}[X]$ be its associated family of
orthonormal polynomials. When $(X_k)_{k \neq i,j}$ is fixed, they can be viewed as
polynomials in $\mathbb{R}[X_i, X_j]$. If in doing so they exhibit a triangular structure
[whence, the name conditional triangularity w.r.t. $(X_k)_{k \neq i,j}$], then entries
of $M_d^{-1}$ at precise locations vanish. Conversely, if these precise entries of
$M_d^{-1}$ vanish (robustly to perturbation), then the conditional triangularity
w.r.t. $(X_k)_{k \neq i,j}$ holds. And so, for the covariance matrix case $d = 1$, this
conditional triangularity property is equivalent to the zero partial correlation
property well studied in statistics (whereas in general, we shall show
that conditional independence is not detected by zeros in the inverse of the
covariance matrix). Inverses of moment matrices naturally appear also in
the recent work [2].

Interestingly, in a different direction, one may relate this issue with a
constrained matrix completion problem. Namely, given that the entries of
$M_d$ corresponding to marginals of the linear functional w.r.t. one variable
at a time are fixed, complete the missing entries with values that make
$M_d$ positive definite. This is a constrained matrix completion problem as one
has to respect the moment matrix structure when filling up the missing
entries. Usually, for the classical matrix completion problem with no constraint
on $M$, the solution which maximizes an appropriate entropy gives
zeros to entries of $M^{-1}$ corresponding to missing entries of $M$. But under
the additional constraint of respecting the moment matrix structure,
the maximum entropy solution does not always fill in $M_d^{-1}$ with zeros at
the corresponding entries (as seen in examples by the authors). Therefore,
any solution of this constrained matrix completion problem does not always
maximize the entropy. Its "physical" or probabilistic interpretation is still
to be understood.

We point out another accomplishment of this paper. More general than
working with a measure is working with a linear functional $\ell$ on the space
of polynomials. One can consider moments with respect to $\ell$ and moment
matrices. Our results hold at this level of generality.

2. Notation and definitions. For a real symmetric matrix $A \in \mathbb{R}^{n \times n}$, the
notation $A \succ 0$ (resp. $A \succeq 0$) stands for $A$ positive definite (resp. semidefinite),
and for a matrix $B$, let $B'$ or $B^T$ denote its transpose.

2.1. Monomials, polynomials and moments. We now discuss monomials
at some length, since they are used in many ways, even to index the moment
matrices which are the subject of this paper. Let $\mathbb{N}$ denote the nonnegative
integers and $\mathbb{N}^n$ denote $n$-tuples of them, and for $\alpha = (\alpha_1, \alpha_2, \ldots, \alpha_n) \in \mathbb{N}^n$,
define $|\alpha| := \sum_i \alpha_i$. The set $\mathbb{N}^n$ sits in one-to-one correspondence with the
monomials via

$$\alpha \in \mathbb{N}^n \sim X^\alpha := X_1^{\alpha_1} X_2^{\alpha_2} \cdots X_n^{\alpha_n}.$$

Recall also the standard notation

$$\deg X^\alpha = |\alpha| = \alpha_1 + \cdots + \alpha_n.$$

By abuse of notation, we will freely interchange below $X^\alpha$ with $\alpha$, for
instance, speaking about $\deg \alpha$ rather than $\deg X^\alpha$, and so on.

Let $\mathbb{R}[X]$ denote the ring of real polynomials in the variables $X_1, \ldots, X_n$
and let $\mathbb{R}_d[X] \subset \mathbb{R}[X]$ be the $\mathbb{R}$-vector space of polynomials of degree at
most $d$. A polynomial $p \in \mathbb{R}[X]$ is a finite linear combination of monomials
and it can be written

(2.1)   $$p(X) = \sum_{\alpha \in \mathbb{N}^n} p_\alpha X^\alpha.$$

Let $y = (y_\alpha)_{\alpha \in \mathbb{N}^n}$ and define the linear functional $L_y : \mathbb{R}[X] \to \mathbb{R}$ first on
monomials by $L_y(X^\alpha) = y_\alpha$ and then by linear extension to polynomials.
That is,

(2.2)   $$p \mapsto L_y(p) := \sum_{\alpha \in \mathbb{N}^n} p_\alpha y_\alpha,$$

whenever $(p_\alpha)$ are the coefficients of a polynomial as in (2.1). A linear
functional $L_y$ on polynomials which is nonnegative (resp. positive) on all squares
of polynomials [i.e., $L_y(p^2)$ is nonnegative] is what we call a nonnegative
(resp. positive) functional.

The most prominent example is when the $y_\alpha$ are moments of a measure
$\mu$ on $\mathbb{R}^n$, that is,

$$y_\alpha = \int_{\mathbb{R}^n} x^\alpha \, \mu(dx) \quad \forall \alpha \in \mathbb{N}^n,$$

assuming of course that the signed measure $\mu$ decays fast enough at infinity,
so that all monomials are integrable with respect to its total variation. Then

$$L_y(p) = \int_{\mathbb{R}^n} p(x) \, d\mu(x)$$

for every $p \in \mathbb{R}[X]$. Call $\mu$ a representing measure for $y$. For a positive measure
$\mu$, the functional $L_y$ is nonnegative; however, in the converse direction,
a linear functional $L_y$ on polynomials being nonnegative on squares of polynomials
is not equivalent to there existing a positive measure $\mu$. This is
closely related to Hilbert's 17th Problem and its progeny, focusing on positive
polynomials not always being a sum of squares.

As the reader will see, much of what we do here holds for positive linear
functionals; no measure is required. To state our results, we must introduce
finite moment matrices. Their entries are indexed by monomials and so we
must describe orders on monomials.

2.2. Orders on monomials. Define the fully graded partial order (FG
order) "$\le$" on monomials, or equivalently, at the level of multi-indices,
define $\gamma \le \alpha$ for $\gamma, \alpha \in \mathbb{N}^n$ iff $\gamma_j \le \alpha_j$ for all $j = 1, \ldots, n$. Important to us is

$$\alpha \le \beta \quad \text{iff} \quad X^\alpha \text{ divides } X^\beta.$$

Define the graded lexicographic order (GLex) "$<_{gl}$" on monomials, or
equivalently, for $\gamma, \alpha \in \mathbb{N}^n$, first by using $\deg \gamma \le \deg \alpha$ to create a partial
order. Next refine this to a total order by breaking ties in two monomials
$X^{m_1}$, $X^{m_2}$ of the same degree $|m_1| = |m_2|$, as would a dictionary with $X_1 = a$,
$X_2 = b$, and so on. For example, the monomials in two variables $X_1, X_2$ of degree
$\le 2$ listed in GLex order are

$$1, \; X_1, \; X_2, \; X_1^2, \; X_1 X_2, \; X_2^2.$$

Beware $\gamma <_{gl} \alpha$ does not imply $\gamma \le \alpha$; for example, $(1,1,3) <_{gl} (1,3,1)$, but
$\le$ fails. However, $\beta \le \alpha$ and $\beta \neq \alpha$ imply $\beta <_{gl} \alpha$.
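
To make the two orders concrete, here is a minimal Python sketch. It is an illustration only: the helper names (`glex_key`, `monomials_up_to`, `divides`) are hypothetical, and the tie-breaking rule encodes one reading of the "dictionary with $X_1 = a$, $X_2 = b$" convention, chosen so that the degree $\le 2$ listing above is reproduced.

```python
from itertools import product

def glex_key(alpha):
    """Sort key for GLex: total degree first, then dictionary order.

    Assumption: within a degree, a larger exponent on an earlier variable
    comes first (X1 = 'a' sorts before X2 = 'b'), hence the negation.
    """
    return (sum(alpha), tuple(-a for a in alpha))

def monomials_up_to(n, d):
    """All exponent tuples in N^n of degree <= d, listed in GLex order."""
    exps = [a for a in product(range(d + 1), repeat=n) if sum(a) <= d]
    return sorted(exps, key=glex_key)

def divides(alpha, beta):
    """X^alpha divides X^beta iff alpha <= beta componentwise (the FG order)."""
    return all(a <= b for a, b in zip(alpha, beta))

# Exponents of 1, X1, X2, X1^2, X1*X2, X2^2.
print(monomials_up_to(2, 2))
# GLex is total, FG is only partial: X1 precedes X2 in GLex, yet X1 does not divide X2.
print(glex_key((1, 0)) < glex_key((0, 1)), divides((1, 0), (0, 1)))
```

The FG order is only a partial order, which is why the moment matrices below are indexed by the total GLex order while the zero patterns are described through divisibility.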

It is convenient to list all monomials as an infinite vector $v_\infty(X) :=
(X^\alpha)_{\alpha \in \mathbb{N}^n}$, where the entries are listed in GLex order, henceforth called the
tautological vector; $v_d(X) = (X^\alpha)_{|\alpha| \le d} \in \mathbb{R}[X]^{s(d)}$ denotes the finite vector
consisting of the part of $v_\infty(X)$ containing exactly the degree $\le d$ monomials.
Using this notation, we can write polynomials as

(2.3)   $$p(X) = \langle \mathbf{p}, v_d(X) \rangle$$

for some real vector $\mathbf{p} = (p_\alpha)$, where the latter is the standard nondegenerate
pairing between $\mathbb{R}^{s(d)}$ and $\mathbb{R}^{s(d)} \otimes_{\mathbb{R}} \mathbb{R}[X]$.

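2.3. Moment matrix. Given a sequence $y = (y_\alpha)_{\alpha \in \mathbb{N}^n}$, the moment
matrix $M_d(y)$ associated with $y$ has its rows and columns indexed by $\alpha$, $|\alpha| \le d$,
where the $\alpha$ are listed in GLex order, and

$$M_d(y)(\alpha, \beta) := L_y(X^\alpha X^\beta) = y_{\alpha+\beta} \quad \forall \alpha, \beta \in \mathbb{N}^n \text{ with } |\alpha|, |\beta| \le d.$$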

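For example, in two variables, $M_2(y)$ is

$$
M_2(y) : \quad
\begin{array}{c|cccccc}
        & 1      & X_1    & X_2    & X_1^2  & X_1X_2 & X_2^2  \\ \hline
1       & 1      & y_{10} & y_{01} & y_{20} & y_{11} & y_{02} \\
X_1     & y_{10} & y_{20} & y_{11} & y_{30} & y_{21} & y_{12} \\
X_2     & y_{01} & y_{11} & y_{02} & y_{21} & y_{12} & y_{03} \\
X_1^2   & y_{20} & y_{30} & y_{21} & y_{40} & y_{31} & y_{22} \\
X_1X_2  & y_{11} & y_{21} & y_{12} & y_{31} & y_{22} & y_{13} \\
X_2^2   & y_{02} & y_{12} & y_{03} & y_{22} & y_{13} & y_{04}
\end{array}
$$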

Note that the functional $L_y$ produces for every $d \ge 0$ a positive semidefinite
moment matrix $M_d(y)$ if and only if

$$L_y(p^2) \ge 0 \quad \forall p \in \mathbb{R}[X],$$

that is, iff $L_y$ is a nonnegative functional. The associated matrices $M_d(y)$
are positive definite for all $d \ge 0$ if and only if $L_y(p^2) = 0$ implies $p = 0$.
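
As an illustration of the definition $M_d(y)(\alpha, \beta) = y_{\alpha+\beta}$, the following Python sketch assembles $M_2(y)$ in two variables with exact rational arithmetic. The choice of moments (Lebesgue measure on the unit square) and all function names are assumptions made for the example, not part of the text.

```python
from fractions import Fraction
from itertools import product

def glex_monomials(n, d):
    """Exponent tuples of degree <= d in GLex order (X1 before X2 within a degree)."""
    exps = [a for a in product(range(d + 1), repeat=n) if sum(a) <= d]
    return sorted(exps, key=lambda a: (sum(a), tuple(-x for x in a)))

def moment_matrix(y, n, d):
    """M_d(y)(alpha, beta) = y_{alpha + beta}; `y` maps an exponent tuple to a moment."""
    idx = glex_monomials(n, d)
    return [[y(tuple(a + b for a, b in zip(al, be))) for be in idx] for al in idx]

def uniform_square_moment(alpha):
    """Hypothetical example data: moments of Lebesgue measure on [0,1]^2."""
    m = Fraction(1)
    for a in alpha:
        m /= (a + 1)
    return m

M2 = moment_matrix(uniform_square_moment, 2, 2)
for row in M2:
    print([str(entry) for entry in row])
```

The same moment $y_{22}$ fills both the $(X_1^2, X_2^2)$ and the $(X_1X_2, X_1X_2)$ entries; this repetition is the moment matrix structure that a completion problem must respect.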

3. Measures of product form. Now we can describe one pursuit of this
paper. Given a sequence $y = (y_\alpha)$ indexed by $\alpha$, $|\alpha| \le 2d$, we investigate some
properties of the inverse $M_d(y)^{-1}$ of a positive definite moment matrix $M_d(y)$
when entries of the latter satisfy a product form property.

Definition 1. We say that the moment matrix $M_d(y) \succ 0$ has the product
form property, if

(3.1)   $$L_y(X^\alpha) = \prod_{i=1}^{n} L_y(X_i^{\alpha_i}) \quad \forall \alpha \in \mathbb{N}^n, \; |\alpha| \le 2d,$$

or equivalently, we say the positive linear functional $L_y$ has the independence
property. If $M_d(y)^{-1}(\alpha, \beta) = 0$ for every $y$ such that $M_d(y) \succ 0$ has
the product form property, then we say the pair $(\alpha, \beta)$ is a congenital zero
for $d$-moments.

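For example, if $y$ consists of moments of a product measure $\mu = \mu_1 \otimes \cdots \otimes \mu_n$
with $\int d\mu_i = 1$, then (3.1) corresponds to the fact

$$L_y(X^\alpha) = \int X^\alpha \, d\mu = \prod_{i=1}^{n} \int X_i^{\alpha_i} \, d\mu_i.$$

For random variables, congenital zeros can be thought of as those zeros
in $M_d(y)^{-1}$ due to independence.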

We now can give the flavor of our main results. 

Theorem 3.1. The pair $(\alpha, \beta) \in \mathbb{N}^n \times \mathbb{N}^n$ is a congenital zero for the
$d$-moment problem if and only if the least common multiple of $X^\alpha$ and
$X^\beta$ has degree bigger than $d$.

The above result can be conveniently rephrased in terms of the max operation
defined for $\alpha, \beta \in \mathbb{N}^n$ by

$$\max(\alpha, \beta) := (\max(\alpha_j, \beta_j))_{j=1,\ldots,n}.$$

Set $\eta := \max(\alpha, \beta)$. Simple observations about this operation are as follows:

1. $X^\eta$ is the least common multiple of $X^\alpha$ and $X^\beta$,
2. $X^\alpha$ divides $X^\beta$ iff $X^\eta = X^\beta$, and
   $X^\beta$ divides $X^\alpha$ iff $X^\eta = X^\alpha$,
3. $|\eta| = \sum_{j=1}^{n} \max(\alpha_j, \beta_j)$.

Thus, Theorem 3.1 asserts that the entry $(\alpha, \beta)$ does not correspond to a
congenital zero in the matrix $M_d(y)^{-1}$ if and only if $|\max(\alpha, \beta)| \le d$.

Later in Theorem 5.1 we show that this LCM (least common multiple)
characterization of zeros in $M_d^{-1}(y)$ is equivalent to a highly triangular structure
of orthonormal polynomials associated with the positive functional $L_y$.

Example. In the case of $M_2^{-1}(y)$ in two variables $X_1, X_2$ we indicate
below which entries $M_2(y)^{-1}(\alpha, \beta)$, with $|\beta| = 2$, are congenital zeros. These
$(\alpha, \beta)$ index the last three columns of $M_2(y)^{-1}$ and are

$$
\begin{array}{c|ccc}
        & X_1^2 & X_1X_2 & X_2^2 \\ \hline
1       & *     & *      & *     \\
X_1     & *     & *      & 0     \\
X_2     & 0     & *      & *     \\
X_1^2   & *     & 0      & 0     \\
X_1X_2  & 0     & *      & 0     \\
X_2^2   & 0     & 0      & *
\end{array}
$$

Here $*$ means that the corresponding entry can be different from zero. Note
each 0 corresponds to $X^\alpha$ failing to divide $X^\beta$.
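
The pattern in this example follows mechanically from the criterion $|\max(\alpha, \beta)| > d$ of Theorem 3.1. A short Python sketch (function names are hypothetical; the GLex tie-break is the same assumed convention as in the listing $1, X_1, X_2, X_1^2, X_1X_2, X_2^2$) regenerates it:

```python
from itertools import product

def glex_monomials(n, d):
    """Exponent tuples of degree <= d in GLex order (X1 before X2 within a degree)."""
    exps = [a for a in product(range(d + 1), repeat=n) if sum(a) <= d]
    return sorted(exps, key=lambda a: (sum(a), tuple(-x for x in a)))

def is_congenital_zero(alpha, beta, d):
    """Theorem 3.1: zero iff deg lcm(X^alpha, X^beta) = |max(alpha, beta)| > d."""
    return sum(max(a, b) for a, b in zip(alpha, beta)) > d

def zero_pattern(n, d):
    """Matrix of '0' (congenital zero) and '*' (possibly nonzero) in GLex indexing."""
    idx = glex_monomials(n, d)
    return [["0" if is_congenital_zero(a, b, d) else "*" for b in idx] for a in idx]

pattern = zero_pattern(2, 2)
# Keep only the columns indexed by X1^2, X1*X2, X2^2.
last_three = ["".join(row[3:]) for row in pattern]
for label, row in zip(["1", "X1", "X2", "X1^2", "X1X2", "X2^2"], last_three):
    print(f"{label:>5}  {row}")
```

Calling `zero_pattern(n, d)` for other values of `n` and `d` gives the corresponding pattern for larger moment matrices.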

The proof relies on properties of orthogonal polynomials, so we begin by 
explaining in some detail the framework. 

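3.1. Orthonormal polynomials. A functional analytic viewpoint to polynomials
is expeditious, so we begin with that. Let $s(d) := \binom{n+d}{d}$ be the
dimension of the vector space $\mathbb{R}_d[X]$. Let $\langle \cdot, \cdot \rangle_{\mathbb{R}^{s(d)}}$ denote the standard inner
product on $\mathbb{R}^{s(d)}$. Let $f, h \in \mathbb{R}[X]$ be the polynomials $f(X) = \sum_{|\alpha| \le d} f_\alpha X^\alpha$
and $h(X) = \sum_{|\alpha| \le d} h_\alpha X^\alpha$. Then,

$$\langle f, h \rangle_y := \langle \mathbf{f}, M_d(y) \mathbf{h} \rangle_{\mathbb{R}^{s(d)}} = L_y(f(X) h(X))$$

defines a scalar product in $\mathbb{R}_d[X]$, provided $M_d(y)$ is positive definite.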

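With a given $y = (y_\alpha)$ such that $M_d(y) \succ 0$, one may associate a unique
family $(p_\alpha)_{|\alpha| \le d}$ of orthonormal polynomials. That is, the $p_\alpha$ satisfy

(3.2)   $$
\begin{cases}
p_\alpha \in \operatorname{lin.span}\{X^\beta ; \; \beta \le_{gl} \alpha\}, \\
\langle p_\alpha, p_\beta \rangle_y = \delta_{\alpha\beta}, \quad |\alpha|, |\beta| \le d, \\
\langle p_\alpha, X^\beta \rangle_y = 0 \text{ if } \beta <_{gl} \alpha, \quad \langle p_\alpha, X^\alpha \rangle_y > 0.
\end{cases}
$$

Note $\langle p_\alpha, X^\beta \rangle_y = 0$ if $\beta \le \alpha$ and $\beta \neq \alpha$, since the latter implies $\beta <_{gl} \alpha$.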

Existence and uniqueness of such a family is guaranteed by the Gram- 
Schmidt orthonormalization process following the GLex order on the mono- 
mials, and by the positivity of the moment (covariance) matrix; see, for 
instance, [1], Theorem 3.1.11, page 68. 

Computation. Although not needed for the rest of the present article, a 
determinantal formula for the orthogonal polynomials is within reach, with 
a proof very similar to the classical one in the one variable case. The reader 
can omit this subsection without loss of continuity. 

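Suppose that we want to compute the orthonormal polynomial $p_\sigma$ for
some index $\sigma$. Then proceed as follows: build up the sub-moment matrix
$M^{(\sigma)}(y)$ with columns indexed by all monomials $\beta \le_{gl} \sigma$, and rows
indexed by all monomials $\alpha <_{gl} \sigma$. Hence, $M^{(\sigma)}(y)$ has one row less than
columns. Next, complete $M^{(\sigma)}(y)$ with an additional last row described by
$[M^{(\sigma)}(y)]_{\sigma, \beta} = X^\beta$, for all $\beta \le_{gl} \sigma$. Then up to a normalizing constant, $p_\sigma$ is
nothing less than $\det(M^{(\sigma)}(y))$.

To see this, let $\gamma <_{gl} \sigma$. Then

$$\langle X^\gamma, p_\sigma \rangle_y = L_y(X^\gamma \det(M^{(\sigma)}(y))) = \det(B^{(\sigma)}(y)),$$

where the matrix $B^{(\sigma)}(y)$ is the same as $M^{(\sigma)}(y)$ except for the last row which
is now the vector $(L_y(X^{\gamma+\alpha}))_{\alpha \le_{gl} \sigma}$, already present in one of the rows above.
Therefore, $\det(B^{(\sigma)}(y)) = 0$. For instance, with $n = 2$ and the ordering $X_1 <
X_2$, let $\sigma := (0, 1)$. Then $\langle X_1, p_\sigma \rangle_y = 0$ because

$$
\langle X_1, p_\sigma \rangle_y
= L_y\left( X_1 \det \begin{bmatrix} 1 & y_{10} & y_{01} \\ y_{10} & y_{20} & y_{11} \\ 1 & X_1 & X_2 \end{bmatrix} \right)
= L_y\left( \det \begin{bmatrix} 1 & y_{10} & y_{01} \\ y_{10} & y_{20} & y_{11} \\ X_1 & X_1^2 & X_1X_2 \end{bmatrix} \right)
= \det \begin{bmatrix} 1 & y_{10} & y_{01} \\ y_{10} & y_{20} & y_{11} \\ y_{10} & y_{20} & y_{11} \end{bmatrix}
= 0,
$$

and similarly, $\langle 1, p_\sigma \rangle_y = 0$.

Next, writing $p_\sigma(X) = \sum_{\beta \le_{gl} \sigma} \rho_{\sigma\beta} X^\beta$, its coefficient $\rho_{\sigma\beta}$ is just (again up to a normalizing
constant) the cofactor of the element $[M^{(\sigma)}(y)]_{\sigma, \beta}$ in the square matrix $M^{(\sigma)}(y)$
with rows and columns both indexed with $\alpha \le_{gl} \sigma$.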
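
The worked case $\sigma = (0, 1)$ can be checked numerically. The Python sketch below expands the bordered $3 \times 3$ determinant along its last row $(1, X_1, X_2)$ to obtain the coefficients of $p_\sigma$ up to normalization, then pairs the result with the lower monomials. The moment values are arbitrary illustrative numbers and the function names are hypothetical.

```python
from fractions import Fraction

def det2(a, b, c, d):
    """Determinant of the 2x2 matrix [[a, b], [c, d]]."""
    return a * d - b * c

def p_sigma_coeffs(y):
    """Cofactors along the last row (1, X1, X2) of the bordered matrix for sigma = (0, 1).

    Rows above the border: (y00, y10, y01) and (y10, y20, y11).
    Returns the coefficients of 1, X1, X2 in det(M^(sigma)(y)).
    """
    c_one = det2(y[(1, 0)], y[(0, 1)], y[(2, 0)], y[(1, 1)])
    c_x1 = -det2(y[(0, 0)], y[(0, 1)], y[(1, 0)], y[(1, 1)])
    c_x2 = det2(y[(0, 0)], y[(1, 0)], y[(1, 0)], y[(2, 0)])
    return {(0, 0): c_one, (1, 0): c_x1, (0, 1): c_x2}

def pair_with_monomial(coeffs, gamma, y):
    """L_y(X^gamma * p) for a polynomial p given by its coefficient dictionary."""
    return sum(c * y[(gamma[0] + b[0], gamma[1] + b[1])] for b, c in coeffs.items())

# Arbitrary illustrative moments up to degree 2 (an assumption, not data from the text).
y = {(0, 0): Fraction(1), (1, 0): Fraction(1, 2), (0, 1): Fraction(1, 3),
     (2, 0): Fraction(1, 3), (1, 1): Fraction(1, 5), (0, 2): Fraction(1, 4)}

coeffs = p_sigma_coeffs(y)
print(pair_with_monomial(coeffs, (0, 0), y), pair_with_monomial(coeffs, (1, 0), y))
```

Both pairings vanish for any choice of moments, because each one equals a determinant with a repeated row, exactly as in the argument above.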



Further properties. Now we give further properties of the orthonormal
polynomials. Consider first one variable polynomials. The orthogonal polynomials
$p_k$ have, by their very definition, a "triangular" form, namely

$$p_k(X_1) := \sum_{\ell \le k} \rho_{k\ell} X_1^\ell.$$

The orthonormal polynomials inherit the product form property of $M_d(y)$,
assuming that the latter holds. Namely, each orthonormal polynomial $p_\alpha$ is
a product

(3.3)   $$p_\alpha(X) = p_{\alpha_1}(X_1) \, p_{\alpha_2}(X_2) \cdots p_{\alpha_n}(X_n)$$

of orthogonal polynomials $p_{\alpha_j}(X_j)$ in one dimension. Indeed, by the product
property,

$$\langle p_{\alpha_1}(X_1) \cdots p_{\alpha_n}(X_n), X^\beta \rangle_y = \prod_{i=1}^{n} \langle p_{\alpha_i}(X_i), X_i^{\beta_i} \rangle_y,$$

whence the product of single variable orthogonal polynomials satisfies all
requirements listed in (3.2).

"Triangularity" in one variable and the product form property (3.3) force
$p_\alpha$ to have what we call a fully triangular form:

(3.4)   $$p_\alpha(X) := \sum_{\gamma \le \alpha} \rho_{\alpha\gamma} X^\gamma, \quad |\alpha| \le d.$$

Also note that for any $\gamma \le \alpha$ there exists a positive functional $L_y$ of product
type making $\rho_{\alpha\gamma}$ not zero.
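
Full triangularity under the product form property can be observed directly. The following Python sketch runs Gram-Schmidt along the GLex order in exact arithmetic and checks that every nonzero coefficient of $p_\alpha$ sits on a monomial $X^\gamma$ with $\gamma \le \alpha$. Two simplifications are assumed: the polynomials are left monic rather than normalized (normalization rescales each polynomial and does not move zeros), and the product measure is taken to be Lebesgue measure on $[0,1]^2$. All names are hypothetical.

```python
from fractions import Fraction
from itertools import product

def glex_monomials(n, d):
    """Exponent tuples of degree <= d in GLex order (X1 before X2 within a degree)."""
    exps = [a for a in product(range(d + 1), repeat=n) if sum(a) <= d]
    return sorted(exps, key=lambda a: (sum(a), tuple(-x for x in a)))

def moment(alpha):
    """Hypothetical product-form moments: Lebesgue measure on [0,1]^n."""
    m = Fraction(1)
    for a in alpha:
        m /= (a + 1)
    return m

def inner(f, g, idx):
    """<f, g>_y = sum_{a,b} f_a g_b y_{a+b} for coefficient lists f, g indexed by idx."""
    return sum(fa * gb * moment(tuple(x + z for x, z in zip(a, b)))
               for fa, a in zip(f, idx) for gb, b in zip(g, idx))

def orthogonal_polys(n, d):
    """Gram-Schmidt along GLex order; returns monic (unnormalized) orthogonal polynomials."""
    idx = glex_monomials(n, d)
    polys = []
    for k in range(len(idx)):
        q = [Fraction(int(i == k)) for i in range(len(idx))]  # coefficients of X^alpha
        for p in polys:
            c = inner(q, p, idx) / inner(p, p, idx)
            q = [qi - c * pi for qi, pi in zip(q, p)]
        polys.append(q)
    return idx, polys

idx, polys = orthogonal_polys(2, 2)
fully_triangular = all(
    coeff == 0 or all(g <= a for g, a in zip(gamma, alpha))
    for alpha, q in zip(idx, polys)
    for gamma, coeff in zip(idx, q)
)
print(fully_triangular)
```

For instance, the polynomial indexed by $(1,1)$ comes out as $(X_1 - 1/2)(X_2 - 1/2)$: it has no $X_1^2$ term even though $X_1^2$ precedes $X_1X_2$ in GLex order.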

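To exhibit such a functional, we will use a particular property of coefficients
of Laguerre polynomials. Given $\sigma = (\sigma_1, \ldots, \sigma_n)$, consider the product measure
$\mu(dx) = \prod_{i=1}^{n} \mu_i(dx_i)$ on the positive orthant $\mathbb{R}^n_+$, with $\mu_i(dx_i) = e^{-x_i} x_i^{\sigma_i} \, dx_i$.

The univariate Laguerre polynomials

$$L_k^{(\sigma_i)}(x_i) = \sum_{j=0}^{k} \binom{k+\sigma_i}{k-j} \frac{(-x_i)^j}{j!}$$

are orthogonal with respect to the measure $\mu_i$ on the semi-axis $\mathbb{R}_+$; see, for
instance, [4]. Observe that the degree of the coefficient of $x_i^j$ with respect to
the variable $\sigma_i$ is precisely $k - j$.

The orthogonal polynomials associated with the product measure $\mu$ and
its associated positive functional

$$L_y(p) = \int_{\mathbb{R}^n_+} p(x_1, \ldots, x_n) \, e^{-x_1 - \cdots - x_n} x_1^{\sigma_1} \cdots x_n^{\sigma_n} \, dx_1 \cdots dx_n, \quad p \in \mathbb{R}[X],$$

are the (Laguerre$_\sigma$) polynomials $\Lambda_\alpha^\sigma(X) = \prod_{j=1}^{n} L_{\alpha_j}^{(\sigma_j)}(X_j)$.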
We formalize a simple observation as a lemma because we use it later. 

Lemma 3.2. The coefficients $\rho_{\alpha,\beta}(\sigma)$ in the decomposition

$$\Lambda_\alpha^\sigma(X) = \sum_{\beta \le \alpha} \rho_{\alpha,\beta}(\sigma) X^\beta$$

of Laguerre$_\sigma$ polynomials are themselves polynomials in $\sigma = (\sigma_1, \ldots, \sigma_n)$,
viewed as independent variables, and the multi-degree of $\rho_{\alpha,\beta}(\sigma)$ is $\alpha - \beta$.

Note that for an appropriate choice of the parameters $\sigma_j$, $1 \le j \le n$, the
coefficients $\rho_{\alpha,\beta}(\sigma)$ in the decomposition

$$\Lambda_\alpha^\sigma(X) = \sum_{\beta \le \alpha} \rho_{\alpha,\beta}(\sigma) X^\beta$$

are linearly independent over the rational field, and hence nonnull. To prove
this, evaluate $\sigma$ on an $n$-tuple of algebraically independent transcendental
real numbers over the rational field.

Note that we have used very little of the specific properties of the Laguerre
polynomials. The mere precise polynomial dependence of their coefficients
with respect to the variable $\sigma$ was sufficient for the proof of the above lemma.
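
The degree count behind Lemma 3.2 can be verified symbolically in one variable. The Python sketch below assumes the standard closed form $L_k^{(\sigma)}(x) = \sum_{j=0}^{k} (-1)^j \binom{k+\sigma}{k-j} x^j / j!$ for the generalized Laguerre polynomials and represents each coefficient as a polynomial in $\sigma$ (a list of rational coefficients in ascending powers). Function names are hypothetical.

```python
from fractions import Fraction
from math import factorial

def poly_mul(p, q):
    """Product of two polynomials given as ascending coefficient lists."""
    out = [Fraction(0)] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out

def laguerre_coeff(k, j):
    """Coefficient of x^j in L_k^{(sigma)}(x), as a polynomial in sigma.

    Uses binom(k + sigma, k - j) = prod_{t=j+1}^{k} (sigma + t) / (k - j)!.
    """
    poly = [Fraction(1)]
    for t in range(j + 1, k + 1):
        poly = poly_mul(poly, [Fraction(t), Fraction(1)])  # multiply by (t + sigma)
    scale = Fraction((-1) ** j, factorial(j) * factorial(k - j))
    return [scale * c for c in poly]

def degree(p):
    """Degree of a nonzero polynomial given as an ascending coefficient list."""
    return max(i for i, c in enumerate(p) if c != 0)

for k in range(4):
    print(k, [degree(laguerre_coeff(k, j)) for j in range(k + 1)])
```

In the multivariate product $\Lambda_\alpha^\sigma$, the coefficient of $X^\beta$ is a product of such univariate coefficients, one per variable, which is where the multi-degree $\alpha - \beta$ comes from.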

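3.2. Proof of Theorem 3.1. Let $L_y$ be a linear functional for which
$M_d(y) \succ 0$ and let $(p_\alpha)$ denote the family of orthogonal polynomials with
respect to $L_y$. Orthogonality in (3.2) for expansions (3.4) reads

$$\delta_{\alpha\beta} = \langle p_\alpha, p_\beta \rangle_y = \sum_{\gamma \le \alpha, \; \sigma \le \beta} \rho_{\alpha\gamma} \, \rho_{\beta\sigma} \, y_{\gamma+\sigma}.$$

In matrix notation this is just

$$I = D M_d(y) D^T,$$

where $D$ is the matrix $D = (\rho_{\alpha\gamma})_{|\alpha|, |\gamma| \le d}$. Its columns are indexed (as before)
by monomials arranged in GLex order, and likewise for its rows. That $\rho_{\alpha\gamma} = 0$
if $\gamma \not\le \alpha$ implies that $\rho_{\alpha\gamma} = 0$ if $\alpha <_{gl} \gamma$, which says precisely that $D$ is
lower triangular. Moreover, its diagonal entries are not 0, since $p_\alpha$ must
have $X^\alpha$ as its highest order term. Because of this and triangularity, $D$ is
invertible. Write

$$M_d(y) = D^{-1} (D^T)^{-1} \quad \text{and} \quad M_d(y)^{-1} = D^T D.$$

Our goal is to determine which entries of $M_d(y)^{-1}$ are forced to be zero and
we proceed by writing the formula $Z := M_d(y)^{-1} = D^T D$ as

(3.5)   $$z_{\alpha\beta} = \sum_{|\gamma| \le d} \rho_{\gamma\alpha} \rho_{\gamma\beta} = \sum_{\beta \le \gamma, \; \alpha \le \gamma, \; |\gamma| \le d} \rho_{\gamma\alpha} \rho_{\gamma\beta} = \sum_{\max(\alpha, \beta) \le \gamma, \; |\gamma| \le d} \rho_{\gamma\alpha} \rho_{\gamma\beta}.$$

We emphasize (since it arises later) that this uses only the full triangularity
of the orthogonal polynomials rather than that they are products of one
variable polynomials. If full triangularity were replaced by triangularity w.r.t.
$<_{gl}$, then the first two equalities in (3.5) would be the same except that
$\beta \le \gamma$, $\alpha \le \gamma$, $|\gamma| \le d$ would be replaced by $\beta \le_{gl} \gamma$, $\alpha \le_{gl} \gamma$, $|\gamma| \le d$.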






J. W. HELTON, J. B. LASSERRE AND M. PUTINAR 



To continue with our proof, consider $(\alpha, \beta)$ and set $\eta := \max(\alpha, \beta)$. If
$|\max(\alpha, \beta)| > d$, then $z_{\alpha\beta} = 0$, since the sum in equation (3.5) is empty.
This is the forward side of Theorem 3.1.

Conversely, consider the product measure on the positive orthant $\mathbb{R}^n_+$
whose associated orthogonal polynomials are the Laguerre$_\sigma$ polynomials
$\{\Lambda_\alpha^\sigma\}$ of Lemma 3.2. When $|\max(\alpha, \beta)| \le d$ the entry $z_{\alpha\beta}$ is a sum of one
or more products $\rho_{\gamma\alpha} \rho_{\gamma\beta}$ and so is a polynomial in $\sigma$. If this polynomial is
not identically zero, then some value of $\sigma$ makes $z_{\alpha\beta} \neq 0$, so $(\alpha, \beta)$ is not a
congenital zero. Now we set out to show that $z_{\alpha\beta}$ (as a polynomial in $\sigma$) is
not identically 0.

Lemma 3.2 tells us each product $\rho_{\gamma,\alpha} \rho_{\gamma,\beta}$ is a polynomial whose multi-degree
in $\sigma$ is exactly $2\gamma - \alpha - \beta$. The multi-index $\gamma$ is subject to the constraints
$\max(\alpha, \beta) \le \gamma$ and $|\gamma| \le d$. We fix an index, say, $j = 1$, and choose

$$\hat{\gamma} = \max(\alpha, \beta) + (d - |\max(\alpha, \beta)|, 0, \ldots, 0).$$

Note the product $\rho_{\hat{\gamma}\alpha} \rho_{\hat{\gamma}\beta}$ is included in the sum (3.5) for $z_{\alpha\beta}$ and it is a
polynomial of degree $2d - 2|\max(\alpha, \beta)| + 2\max(\alpha_1, \beta_1) - \alpha_1 - \beta_1$ with respect
to $\sigma_1$. By the extremality of our construction of $\hat{\gamma}$, every other term $\rho_{\gamma\alpha} \rho_{\gamma\beta}$ in
$z_{\alpha\beta}$ will have smaller $\sigma_1$ degree. Hence, $\rho_{\hat{\gamma}\alpha} \rho_{\hat{\gamma}\beta}$ cannot be canceled, proving
that $z_{\alpha\beta}$ is not the zero polynomial.
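
The forward direction of Theorem 3.1 lends itself to an exact numerical check. The Python sketch below builds $M_2(y)$ for one product measure (Lebesgue measure on $[0,1]^2$, an assumption made for the example), inverts it with rational Gauss-Jordan elimination, and tests that every entry with $|\max(\alpha, \beta)| > d$ vanishes exactly. It says nothing about the converse, which needs the genericity argument above. Names are hypothetical.

```python
from fractions import Fraction
from itertools import product

def glex_monomials(n, d):
    """Exponent tuples of degree <= d in GLex order (X1 before X2 within a degree)."""
    exps = [a for a in product(range(d + 1), repeat=n) if sum(a) <= d]
    return sorted(exps, key=lambda a: (sum(a), tuple(-x for x in a)))

def moment(alpha):
    """Hypothetical product-form moments: Lebesgue measure on [0,1]^n."""
    m = Fraction(1)
    for a in alpha:
        m /= (a + 1)
    return m

def inverse(M):
    """Gauss-Jordan inverse in exact rational arithmetic."""
    size = len(M)
    A = [list(row) + [Fraction(int(i == j)) for j in range(size)]
         for i, row in enumerate(M)]
    for c in range(size):
        pivot_row = next(r for r in range(c, size) if A[r][c] != 0)
        A[c], A[pivot_row] = A[pivot_row], A[c]
        pivot = A[c][c]
        A[c] = [x / pivot for x in A[c]]
        for r in range(size):
            if r != c:
                factor = A[r][c]
                A[r] = [x - factor * y for x, y in zip(A[r], A[c])]
    return [row[size:] for row in A]

n, d = 2, 2
idx = glex_monomials(n, d)
M = [[moment(tuple(a + b for a, b in zip(al, be))) for be in idx] for al in idx]
Z = inverse(M)

# Index pairs that Theorem 3.1 predicts to be congenital zeros.
forced = [(i, j) for i, al in enumerate(idx) for j, be in enumerate(idx)
          if sum(max(a, b) for a, b in zip(al, be)) > d]
all_forced_vanish = all(Z[i][j] == 0 for i, j in forced)
print(len(forced), all_forced_vanish)
```

Exact fractions are used so that the zeros are genuine rather than floating-point values near zero.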

4. Partial independence. In this section we consider the case where only
a partial independence property holds. We decompose the variables into
disjoint sets $X = (X(1), \ldots, X(k))$, where $X(1) = (X(1)_1, \ldots, X(1)_{d_1})$, and
so on. Note that the lexicographic order on $\mathbb{N}^n$ respects the grouping of
variables, in the sense

$$(\alpha_1, \ldots, \alpha_k) <_{gl} (\beta_1, \ldots, \beta_k)$$

if and only if, either $\alpha_1 <_{gl} \beta_1$ or, if $\alpha_1 = \beta_1$, then, either $\alpha_2 <_{gl} \beta_2$, and so
on.

The linear functional $L_y$ is said to satisfy a partial independence property
(w.r.t. the fixed grouping of variables), if

$$L_y(p_1(X(1)) \cdots p_k(X(k))) = \prod_{j=1}^{k} L_y(p_j(X(j))),$$

where $p_j$ is a polynomial in the variables from the set $X(j)$, respectively.

In this context we still, analogously to Definition 1, use the term congenital
zeros in connection with inverses of moment matrices $M(y) \succ 0$ corresponding
to $L_y$ having the partial independence property. Now we state the
natural generalization of Theorem 3.1 to partial independence.

Theorem 4.1. Let $L_y$ be a positive functional satisfying a partial independence
property with respect to the groups of variables $X = (X(1), \ldots, X(k))$.
Let $\alpha = (\alpha_1, \ldots, \alpha_k)$, $\beta = (\beta_1, \ldots, \beta_k)$ be two multi-indices decomposed according
to the fixed groups of variables, and satisfying $|\alpha|, |\beta| \le d$.

Then the $(\alpha, \beta)$-entry in the matrix $M_d(y)^{-1}$ is congenitally zero if and
only if, for every $\gamma = (\gamma_1, \ldots, \gamma_k)$ satisfying $\gamma_j \ge_{gl} \alpha_j, \beta_j$, $1 \le j \le k$, we have
$|\gamma| > d$.

The structure behind this is just the analog of what we used before.
Denote by $\deg_{X(j)} Q(X)$ the degree of a polynomial $Q$ in the variables $X(j)$.
Assuming that $L_y$ is a positive functional, one can associate in a unique
way the orthogonal polynomials $p_\alpha$, $\alpha \in \mathbb{N}^n$. Let $\alpha = (\alpha_1, \ldots, \alpha_k)$ be a multi-index
decomposed with respect to the groupings $X(1), \ldots, X(k)$. Then, the
uniqueness property of the orthonormal polynomials implies

$$p_\alpha(X) = p_{\alpha_1}(X(1)) \cdots p_{\alpha_k}(X(k)),$$

where $p_{\alpha_j}(X(j))$ are orthonormal polynomials depending solely on $X(j)$,
and arranged in lexicographic order within this group of variables.

With this background, the proof of Theorem 4.1 repeats that of Theorem
3.1, with only the observation that

$$p_\alpha(X) = \sum_{\gamma_j \le_{gl} \alpha_j, \; 1 \le j \le k} c_{\alpha,\gamma} X^\gamma.$$

5. Full triangularity. A second look at the proof of Theorem 3.1 reveals
that the only property of the multivariate orthogonal polynomials we have
used was the full triangularity form (3.4). In this section we provide an
example of a nonproduct measure which has orthogonal polynomials in full
triangular form, and, on the other hand, we prove that the zero pattern
appearing in our main result, in the inverse moment matrices $M_r(y)^{-1}$, $r \le d$,
implies the full triangular form of the associated orthogonal polynomials.
Therefore, zeros in the inverse $M_d^{-1}$ are coming from a certain triangularity
property of orthogonal polynomials rather than from a product form of $M_d$.

Example 1. We work in two real variables $(x, y)$, with the measure
$d\mu = (1 - x^2 - y^2)^t \, dx \, dy$, restricted to the unit disk $x^2 + y^2 < 1$, where $t > -1$
is a parameter.

Let $P_k(u; s)$ denote the orthonormalized Jacobi polynomials, that is, the
univariate orthogonal polynomials on the segment $[-1, 1]$, with respect to
the measure $(1 - u^2)^s \, du$, with $s > -1$.

According to [6], Example 1, Chapter X, the orthonormal polynomials
associated to the measure $d\mu$ on the unit disk are

$$Q_{m+n,n}(x, y) = P_m(x; t + n + 1/2) \, (1 - x^2)^{n/2} \, P_n\left( \frac{y}{\sqrt{1 - x^2}}; t \right)$$

and

$$Q_{n,m+n}(x, y) = P_m(y; t + n + 1/2) \, (1 - y^2)^{n/2} \, P_n\left( \frac{x}{\sqrt{1 - y^2}}; t \right).$$

Observe that these polynomials have full triangular form, yet the generating
measure is not a product of measures.

Theorem 5.1 (Full triangularity theorem). Let $y = (y_\alpha)_{\alpha \in \mathbb{N}^n}$ be a multi-sequence,
such that the associated moment matrices $M_d(y)$ are positive definite,
where $d$ is a fixed positive integer. Then the following holds.

For every $r \le d$, the $(\alpha, \beta)$-entry in $M_r(y)^{-1}$ is 0 whenever $|\max(\alpha, \beta)| >
r$ if and only if the associated orthogonal polynomials $p_\alpha$, $|\alpha| \le d$, have full
triangular form.

Proof. The proof of the forward side is exactly the same as in the proof
of Theorem 3.1. The converse proof begins by expanding each orthogonal
polynomial

(5.1)   $$p_\alpha(X) := \sum_{\gamma \le_{gl} \alpha} \rho_{\alpha\gamma} X^\gamma, \quad |\alpha| \le d,$$

where we emphasize that $\gamma \le_{gl} \alpha$ is used as opposed to (3.4) and we want
to prove that $\beta \le_{gl} \alpha$ and $\beta \not\le \alpha$ imply $\rho_{\alpha\beta} = 0$. Note that when $|\alpha| = d$ the
inequality

(5.2)   $\beta \not\le \alpha$ is equivalent to $|\max(\alpha, \beta)| > d$.

The first step is to use the zero locations of $M_d(y)^{-1}$ to prove that

(5.3)   $$\rho_{\alpha\beta} = 0 \quad \text{if } |\max(\alpha, \beta)| > d,$$

only for $|\alpha| = d$. Once this is established, then we apply the same argument
to prove (5.2) for $|\alpha| = d'$ where $d'$ is smaller.

If $n = 1$, there is nothing to prove. Assume $n > 1$ and decompose $\mathbb{N}^n$ as $\mathbb{N} \times
\mathbb{N}^{n-1}$. The corresponding indices will be denoted by $(i, \alpha)$, $i \in \mathbb{N}$, $\alpha \in \mathbb{N}^{n-1}$.
We shall prove (5.3) by descending induction on the graded lexicographic
order applied to the index $(i, \alpha)$ the following statement:

Let $i + |\alpha| = d$ and assume that $(j, \beta) <_{gl} (i, \alpha)$ and $(j, \beta) \not\le (i, \alpha)$. Then

$$\rho_{(i,\alpha),(j,\beta)} = 0.$$

The precise statement we shall use is equivalent [because of (5.2)] to the
following.

The induction hypothesis: Suppose that

(5.4)   $$\rho_{(i',\alpha'),(j,\beta)} = 0 \quad \text{if } \max(i', j) + |\max(\alpha', \beta)| > d,$$

holds for all indices $(i', \alpha') >_{gl} (i, \alpha)$, with $i' + |\alpha'| = i + |\alpha| = d$.

We want to prove $\rho_{(i,\alpha),(j,\beta)} = 0$. Since we shall proceed by induction from
the top, we assume that $i + |\alpha| = d$, that $(j, \beta) \not\le (i, \alpha)$, that $(j, \beta) <_{gl} (i, \alpha)$,
and let $(i, \alpha)$ denote the largest such w.r.t. $<_{gl}$ order. Clearly, $i = d$. There is
only one corresponding term in the graded lexicographic sequence of indices
of length less than or equal to $d$, namely, $(d, 0)$. We shall prove that, for
every $\beta \in \mathbb{N}^{n-1}$, $(j, \beta) <_{gl} (d, 0)$, $\beta \neq 0$, we have

$$\rho_{(d,0),(j,\beta)} = 0.$$

Since the corresponding entry in $M_d(y)^{-1}$, denoted henceforth as before by
$z_{(d,0),(j,\beta)}$, is zero by assumption, and because

$$z_{(d,0),(j,\beta)} = \rho_{(d,0),(d,0)} \, \rho_{(d,0),(j,\beta)},$$

we obtain $\rho_{(d,0),(j,\beta)} = 0$.

Now we turn to proving the main induction step. Assuming (5.4), we
want to prove $\rho_{(i,\alpha),(j,\beta)} = 0$. Let $(j, \beta) <_{gl} (i, \alpha)$ subject to the condition
$\max(i, j) + |\max(\alpha, \beta)| > d$. Then by hypothesis $0 = z_{(i,\alpha),(j,\beta)}$, so the GLex
version of expansion (3.5) gives

$$
0 = \rho_{(i,\alpha),(i,\alpha)} \, \rho_{(i,\alpha),(j,\beta)}
+ \sum_{i' = i, \; \alpha' >_{gl} \alpha, \; i' + |\alpha'| = d} \rho_{(i',\alpha'),(i,\alpha)} \, \rho_{(i',\alpha'),(j,\beta)}
+ \sum_{i' > i, \; i' + |\alpha'| = d} \rho_{(i',\alpha'),(i,\alpha)} \, \rho_{(i',\alpha'),(j,\beta)}.
$$

We will prove that the two summations above are zero.

Indeed, if $i' > i$, then $\max(i, i') + |\max(\alpha', \alpha)| > i + |\alpha| = d$, and the induction
hypothesis implies $\rho_{(i',\alpha'),(i,\alpha)} = 0$, which eliminates the second sum.

Assume $i' = i$, so that $|\alpha| = |\alpha'|$. Then if $\max(\alpha, \alpha')$ equals either $\alpha$ or
$\alpha'$, we get $\alpha = \alpha'$, but this cannot be, since $\alpha' >_{gl} \alpha$. Thus, $i + |\max(\alpha, \alpha')| >
d$ and the induction hypothesis yields in this case $\rho_{(i',\alpha'),(i,\alpha)} = 0$, which
eliminates the first sum.

In conclusion,

$$\rho_{(i,\alpha),(i,\alpha)} \, \rho_{(i,\alpha),(j,\beta)} = 0,$$

which implies

$$\rho_{(i,\alpha),(j,\beta)} = 0,$$

as desired.

Our induction hypothesis is valid and we initialized it successfully, so we
obtain the working hypothesis:

$$\rho_{(i,\alpha),(j,\beta)} = 0$$

whenever $(j, \beta) <_{gl} (i, \alpha)$, $i + |\alpha| = d$, and $|\max((j, \beta), (i, \alpha))| > d$. By (5.2),
this is full triangularity under the assumption $i + |\alpha| = d$. □

In contrast to the above full triangularity criteria, the product decomposition
of a potential truncated moment sequence $(y_\alpha)_{\alpha_i \le d}$ can be decided by
elementary linear algebra. Assume for simplicity that $n = 2$, and write the
corresponding indices as $\alpha = (i, j) \in \mathbb{N}^2$. Then there are numerical sequences
$(u_i)_{i \le d}$ and $(v_j)_{j \le d}$ with the property

$$y_{(i,j)} = u_i v_j, \quad 0 \le i, j \le d,$$

if and only if

$$\operatorname{rank}\left( y_{(i,j)} \right)_{i,j=0}^{d} \le 1.$$

A similar rank condition, for a corresponding multilinear map, can be deduced
for arbitrary $n$.
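
The rank condition is easy to test in practice: a matrix has rank at most one exactly when all of its $2 \times 2$ minors vanish. A minimal Python sketch for $n = 2$ follows; the function name and both sample arrays are assumptions made for the example.

```python
from fractions import Fraction

def has_product_form(Y):
    """True iff every 2x2 minor of the array (y_(i,j)) vanishes, i.e. rank <= 1."""
    rows, cols = len(Y), len(Y[0])
    return all(Y[i][j] * Y[k][l] == Y[i][l] * Y[k][j]
               for i in range(rows) for k in range(i + 1, rows)
               for j in range(cols) for l in range(j + 1, cols))

d = 2
# Hypothetical product sequence y_(i,j) = u_i * v_j.
u = [Fraction(1, i + 1) for i in range(d + 1)]
v = [Fraction(1, j + 2) for j in range(d + 1)]
Y_product = [[u[i] * v[j] for j in range(d + 1)] for i in range(d + 1)]

# Hypothetical non-product sequence: moments of the uniform law on the diagonal
# {(t, t) : 0 <= t <= 1}, for which y_(i,j) = 1 / (i + j + 1).
Y_diagonal = [[Fraction(1, i + j + 1) for j in range(d + 1)] for i in range(d + 1)]

print(has_product_form(Y_product), has_product_form(Y_diagonal))
```

The second array is a Hilbert matrix, which has full rank, so it admits no factorization $u_i v_j$.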

6. Conditional triangularity. The aim of this section is to extend the full
triangularity theorem of Section 5 to a more general context.

6.1. Conditional triangularity. In this new setting we consider two tuples
of variables

$$X = (x_1, \ldots, x_n), \quad Y = (y_1, \ldots, y_m),$$

and we will impose on the concatenated tuple $(X, Y)$ the full triangularity
only with respect to the set of variables $Y$. The conclusion is as expected:
this assumption will reflect the appearance of some zeros in the inverse
$M_d(y)^{-1}$ of the associated truncated moment matrix. The proof below is quite similar
to that of Theorem 5.1 and we indicate only sufficiently many details to make
clear the differences.

We denote the set of indices by $(\alpha, \beta)$, with $\alpha \in \mathbb{N}^n$, $\beta \in \mathbb{N}^m$, equipped
with the graded lexicographic order "$<_{gl}$". In addition, the set of indices
$\beta \in \mathbb{N}^m$, which refers to the set of variables $Y$, is also equipped with the full
graded order "$\le$".

Let $y = (y_{(\alpha,\beta)})$ be a multi-sequence, such that the associated moment
matrices $M_d(y)$ are positive definite, where $d$ is a positive integer. As before,
we denote by

$$p_{(\alpha,\beta)}(X, Y) = \sum_{(\alpha',\beta') \le_{gl} (\alpha,\beta)} \rho_{(\alpha,\beta),(\alpha',\beta')} \, X^{\alpha'} Y^{\beta'}$$

the associated orthogonal polynomials, and by

$$z_{(\alpha,\beta),(\alpha',\beta')} = \sum_{(\gamma,\sigma) \ge_{gl} (\alpha,\beta), (\alpha',\beta')} \rho_{(\gamma,\sigma),(\alpha,\beta)} \, \rho_{(\gamma,\sigma),(\alpha',\beta')}$$

the entries in $M_d(y)^{-1}$.

Definition 2 (Conditional triangularity). The orthonormal polynomials
$p_{\alpha,\beta} \in \mathbb{R}[X, Y]$, with $|\alpha + \beta| \le 2d$, satisfy the conditional triangularity
with respect to $X$, if when $X$ is fixed and considered as a parameter, the
resulting family denoted $\{p_{\alpha,\beta}|X\} \subset \mathbb{R}[Y]$ is in full triangular form with respect
to the $Y$ variables. More precisely, the following condition $(O)_d$
holds:

$$(O)_d : \quad [(\alpha', \beta') \le_{gl} (\alpha, \beta), \; |(\alpha, \beta)| \le d, \text{ and } \beta' \not\le \beta] \;\Rightarrow\; \rho_{(\alpha,\beta),(\alpha',\beta')} = 0.$$

Next, for a fixed degree $d \ge 1$, we will have to consider the following zero
in the inverse condition.

Definition 3 [Zero in the inverse condition $(V)_d$]. Assume the degree
$d$ is fixed. Let $(\alpha', \beta') \le_{gl} (\alpha, \beta)$ with $|(\alpha, \beta)| \le d$ be arbitrary.

If $|(\gamma, \sigma)| > d$ whenever $(\gamma, \sigma) \ge_{gl} (\alpha, \beta), (\alpha', \beta')$ and $\sigma \ge \max(\beta, \beta')$, then

$$z_{(\alpha,\beta),(\alpha',\beta')} = 0.$$

The main result of this paper asserts that both conditions $(V)_d$ and $(O)_d$
are in fact equivalent.

Theorem 6.1 (Conditional triangularity). Let $y = (y_\alpha)_{\alpha \in \mathbb{N}^n}$ be a multi-sequence
and let $d$ be an integer such that the associated moment matrices
$M_d(y)$ are positive definite.

Then the zero in the inverse condition $(V)_r$, $r \le d$, holds if and only if
$(O)_d$ holds, that is, if and only if the orthonormal polynomials satisfy the
conditional triangularity with respect to $X$.

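Proof. Clearly, from its definition, $(O)_d$ implies $(O)_r$ for all $r \le d$. One
direction is obvious: Let $r \le d$ be fixed, arbitrary. If condition $(O)_r$ holds,
then a pair $((\alpha', \beta') \le_{gl} (\alpha, \beta))$ subject to the assumptions in $(V)_r$ will leave
not a single term in the sum giving $z_{(\alpha,\beta),(\alpha',\beta')}$.

Conversely, assume that $(V)_d$ holds. We will prove the vanishing statement
$(O)_d$ by descending induction with respect to the graded lexicographical
order. To this aim, we label all indices $(\alpha, \beta)$, $|(\alpha, \beta)| = d$ in decreasing
graded lexicographic order:

$$(\alpha_0, \beta_0) >_{gl} (\alpha_1, \beta_1) >_{gl} (\alpha_2, \beta_2) \cdots.$$

In particular, $d = |\alpha_0| \ge |\alpha_1| \ge \cdots$ and $0 = |\beta_0| \le |\beta_1| \le \cdots$.

To initialize the induction, consider $(\alpha', \beta') \le_{gl} (\alpha_0, \beta_0) = (\alpha_0, 0)$, with
$\beta' \not\le 0$, that is, $\beta' \neq 0$. Then

$$0 = z_{(\alpha_0,\beta_0),(\alpha',\beta')} = \rho_{(\alpha_0,\beta_0),(\alpha_0,\beta_0)} \, \rho_{(\alpha_0,\beta_0),(\alpha',\beta')}.$$

Since the leading coefficient $\rho_{(\alpha_0,\beta_0),(\alpha_0,\beta_0)}$ of the orthogonal polynomial is
nonzero, we infer

$$[(\alpha', \beta') \le_{gl} (\alpha_0, \beta_0), \; \beta' \not\le \beta_0] \;\Rightarrow\; \rho_{(\alpha_0,\beta_0),(\alpha',\beta')} = 0,$$

which is exactly condition $(O)_d$ applied to this particular choice of indices.

Assume that $(O)_d$ holds for all $(\alpha_j, \beta_j)$, $j < k$. Let $(\alpha', \beta') \le_{gl} (\alpha_k, \beta_k)$ with
$\beta' \not\le \beta_k$, that is, $|\max(\beta', \beta_k)| > |\beta_k|$. In view of $(V)_d$,

$$0 = z_{(\alpha_k,\beta_k),(\alpha',\beta')} = \rho_{(\alpha_k,\beta_k),(\alpha_k,\beta_k)} \, \rho_{(\alpha_k,\beta_k),(\alpha',\beta')} + \sum_{j=0}^{k-1} \rho_{(\alpha_j,\beta_j),(\alpha_k,\beta_k)} \, \rho_{(\alpha_j,\beta_j),(\alpha',\beta')}.$$

Note that the induction hypothesis implies that, for every $0 \le j \le k - 1$, at
least one of the factors $\rho_{(\alpha_j,\beta_j),(\alpha_k,\beta_k)}$, $\rho_{(\alpha_j,\beta_j),(\alpha',\beta')}$ vanishes. Thus,

$$0 = \rho_{(\alpha_k,\beta_k),(\alpha_k,\beta_k)} \, \rho_{(\alpha_k,\beta_k),(\alpha',\beta')},$$

whence $\rho_{(\alpha_k,\beta_k),(\alpha',\beta')} = 0$.

Once we have exhausted by the above induction all indices of length $d$, we
proceed similarly to those of length $d - 1$, using now the zero in the inverse
property $(V)_{d-1}$, and so on. □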

Remark that the ordering $(X, Y)$ with the tuple of full triangular variables
$Y$ on the second entry is important. A low degree example will be considered
in the last section.

We call Theorem 6.1 the conditional triangularity theorem because when
the variables $X$ are fixed (and so can be considered as parameters), then
the orthogonal polynomials now considered as elements of $\mathbb{R}[Y]$ are in full
triangular form, rephrased as in triangular form conditional on $X$ being fixed.



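6.2. The link with partial correlation. Let us specialize to the case $d = 1$.
Assume that the underlying joint distribution on the random vector $X =
(X_1, \ldots, X_n)$ is centered, that is, $\int X_i \, d\mu = 0$ for all $i = 1, \ldots, n$; then
$M_1$ reads

$$M_1 = \begin{bmatrix} 1 & 0' \\ 0 & R \end{bmatrix} \quad\text{and}\quad M_1^{-1} = \begin{bmatrix} 1 & 0' \\ 0 & R^{-1} \end{bmatrix},$$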

where $R$ is just the usual covariance matrix. Partitioning the random vector
as $(Y, X_i, X_j)$ with $Y = (X_k)_{k \ne i,j}$ yields

$$R = \begin{bmatrix} \mathrm{var}(Y) & \mathrm{cov}(Y,X_i) & \mathrm{cov}(Y,X_j) \\ \mathrm{cov}(Y,X_i) & \mathrm{var}(X_i) & \mathrm{cov}(X_i,X_j) \\ \mathrm{cov}(Y,X_j) & \mathrm{cov}(X_i,X_j) & \mathrm{var}(X_j) \end{bmatrix},$$

where var and cov have obvious meanings. The partial covariance of $X_i$ and
$X_j$ given $Y$, denoted $\mathrm{cov}(X_i, X_j \mid Y)$ in Whittaker ([7], page 135), satisfies

(6.1) $\mathrm{cov}(X_i, X_j \mid Y) := \mathrm{cov}(X_i,X_j) - \mathrm{cov}(Y,X_i)\,\mathrm{var}(Y)^{-1}\,\mathrm{cov}(Y,X_j).$

After scaling $R^{-1}$ to have a unit diagonal, the partial correlation between
$X_i$ and $X_j$ (partialled on $Y$) is the negative of $\mathrm{cov}(X_i,X_j \mid Y)$, and as already
mentioned, $R^{-1}(i,j) = 0$ if and only if $X_i$ and $X_j$ have zero partial correlation,
that is, $\mathrm{cov}(X_i, X_j \mid Y) = 0$. See, for example, Whittaker [7], Corollaries
5.8.2 and 5.8.4.
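
The equivalence between a zero entry of $R^{-1}$ and a zero partial covariance can be checked numerically. The sketch below assumes NumPy and a hypothetical, diagonally dominant precision matrix for four variables ordered as $(Y_1, Y_2, X_i, X_j)$; the function name `partial_cov_block` is illustrative and not from the paper.

```python
import numpy as np

# Hypothetical precision (inverse covariance) matrix for (Y1, Y2, Xi, Xj).
# The (Xi, Xj) entry is set to zero on purpose.
P = np.array([[2.0, 0.3, 0.5, 0.4],
              [0.3, 2.0, 0.2, 0.6],
              [0.5, 0.2, 2.0, 0.0],
              [0.4, 0.6, 0.0, 2.0]])
R = np.linalg.inv(P)                      # covariance matrix

def partial_cov_block(R, i, j):
    """2x2 covariance of variables (i, j) after partialling out all others.
    Its off-diagonal entry is the right-hand side of (6.1)."""
    pair = [i, j]
    rest = [k for k in range(len(R)) if k not in pair]
    cross = R[np.ix_(rest, pair)]         # covariances between the rest and the pair
    return R[np.ix_(pair, pair)] - cross.T @ np.linalg.solve(R[np.ix_(rest, rest)], cross)

# Pair with a zero in the inverse: the partial covariance vanishes.
pc_zero = partial_cov_block(R, 2, 3)[0, 1]

# Pair with a nonzero inverse entry: compare the partial correlation with
# the inverse rescaled to unit diagonal.
block = partial_cov_block(R, 0, 2)
partial_corr = block[0, 1] / np.sqrt(block[0, 0] * block[1, 1])
scaled_inverse = P / np.sqrt(np.outer(np.diag(P), np.diag(P)))
```

In this sketch `partial_corr` agrees with `-scaled_inverse[0, 2]`, which is the standard form of the statement that the rescaled inverse carries the negatives of the partial correlations off its diagonal.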

Corollary 6.2. Let $d = 1$. Then $R^{-1}(i,j) = 0$ if and only if the
orthonormal polynomials of degree up to 2, associated with $M_1$, satisfy the
conditional triangularity with respect to $X = (X_k)_{k \ne i,j}$.

Proof. To recast the problem in the framework of Section 6.1, let $Y =
(X_i, X_j)$ and rename $X := (X_k)_{k \ne i,j}$. In view of Definition 3 with $d = 1$, we
only need consider pairs $(\alpha',\beta') <_{gl} (\alpha,\beta)$ with $\alpha = \alpha' = 0$ and $\beta' = (0,1)$,
$\beta = (1,0)$. But then $\sigma \ge \max[\beta,\beta'] = (1,1)$ implies $|(\gamma,\sigma)| \ge 2 > d$, and so as
$R^{-1}(i,j) = 0$, the zero in the inverse condition $(V)_d$ holds. Equivalently, by
Theorem 6.1, $(O)_d$ holds. □

Corollary 6.2 states that the pair $(X_i, X_j)$ has zero partial correlation if
and only if the orthonormal polynomials up to degree 2 satisfy the
conditional triangularity with respect to $X = (X_k)_{k \ne i,j}$. That is, zero partial
correlation and conditional triangularity are equivalent.

Example 2. Let $d = 1$, and consider the case of three random variables
$(X, Y, Z)$ with (centered) joint distribution $\mu$. Suppose that the
orthonormal polynomials up to degree $d = 1$ satisfy the conditional
triangularity property $(O)_1$ w.r.t. $X$. That is, $p_{000} = 1$ and

$$p_{100} = \alpha_1 + \beta_1 X,$$

$$p_{010} = \alpha_2 + \beta_2 X + \gamma_2 Y,$$

$$p_{001} = \alpha_3 + \beta_3 X + \gamma_3 Z,$$

for some coefficients $(\alpha_i, \beta_i, \gamma_i)$. Notice that because of $(O)_1$, we cannot have
a linear term in $Y$ in $p_{001}$. Orthogonality yields that $\langle X^\gamma, p_\alpha \rangle = 0$ for all
$\gamma <_{gl} \alpha$, that is, with $E$ being the expectation w.r.t. $\mu$,

$$\beta_2 E(X^2) + \gamma_2 E(XY) = 0,$$
$$\beta_3 E(X^2) + \gamma_3 E(XZ) = 0,$$
$$\beta_3 E(XY) + \gamma_3 E(YZ) = 0.$$

Stating that the determinant of the last two linear equations in $(\beta_3, \gamma_3)$ is
zero yields

$$E(YZ) - E(XY)\,E(X^2)^{-1}\,E(XZ) = 0,$$

which is just (6.1), that is, the zero partial correlation condition, up to a 
multiplicative constant. 
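
Example 2 can be reproduced numerically. The sketch below assumes NumPy, unit variances, and hypothetical values for the second moments; it obtains the degree-one orthonormal polynomials from a Cholesky factorization of $M_1$ in the basis $(1, X, Y, Z)$, which is one standard way to compute them and not necessarily the authors' procedure. The function name `degree_one_coefficients` is illustrative.

```python
import numpy as np

def degree_one_coefficients(a, b, c):
    """Coefficient matrix of the orthonormal polynomials of degree <= 1 for a
    centered measure with unit variances, E(XY) = a, E(XZ) = b, E(YZ) = c.
    Rows and columns follow the basis order (1, X, Y, Z)."""
    M1 = np.array([[1.0, 0.0, 0.0, 0.0],
                   [0.0, 1.0, a,   b],
                   [0.0, a,   1.0, c],
                   [0.0, b,   c,   1.0]])
    # Rows of the inverse Cholesky factor are the polynomial coefficients.
    return np.linalg.inv(np.linalg.cholesky(M1))

a, b = 0.6, -0.3                                       # hypothetical second moments
D_zero = degree_one_coefficients(a, b, a * b)          # E(YZ) = E(XY) E(XZ) / E(X^2)
D_other = degree_one_coefficients(a, b, a * b + 0.2)   # condition violated

y_coeff_zero = D_zero[3, 2]     # coefficient of Y in p_001, vanishes
y_coeff_other = D_other[3, 2]   # coefficient of Y in p_001, does not vanish
```

When the zero partial correlation condition holds, the last row has no $Y$ term, so $p_{001}$ has the triangular shape displayed in the example; perturbing $E(YZ)$ restores a linear term in $Y$.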

This immediately raises two questions: 

(i) What are the distributions for which the orthonormal polynomials up
to degree 2 satisfy the conditional triangularity with respect to a given pair
$(X_i, X_j)$?

(ii) Among such distributions, what are those for which conditional
independence with respect to $X = (X_k)_{k \ne i,j}$ also holds?

An answer to the latter would characterize distributions for which zero
partial correlation implies conditional independence (as for the normal
distribution).

6.3. Conditional independence and zeros in the inverse. We have already
mentioned that, in general, conditional independence is not detected from
zero entries in the inverse of $R$ (equivalently, $M_1^{-1}$), except for the normal
joint distribution, a common assumption in Graphical Gaussian Models.
Therefore, a natural question of potential interest is to search for conditions
on when conditional independence in the non-Gaussian case is related to the
zero in the inverse property $(V)_d$, or equivalently, the conditional
triangularity $(O)_d$, not only for $d = 1$, but also for $d > 1$.

A rather negative result in this direction is as follows. Let $d$ be fixed,
arbitrary, and let $M_d = (y_{ijk})$ be the moment matrix of an arbitrary joint
distribution $\mu$ of three random variables $(X, Y_1, Y_2)$ on $\mathbb{R}$. As we are considering
only finitely many moments (up to order $2d$), by Tchakaloff's theorem, there
exists a measure $\varphi$ finitely supported on, say $s$, points $(x^{(l)}, y_1^{(l)}, y_2^{(l)}) \subset \mathbb{R}^3$
(with associated probabilities $\{p_l\}$), $l = 1, \ldots, s$, and whose moments up to
order $2d$ match those of $\mu$; see, for example, Reznick [5], Theorem 7.18.

Let us define a sequence $\{\varphi_t\}$ of probability measures as follows. Perturb
each point $(x^{(l)}, y_1^{(l)}, y_2^{(l)})$ to $(x^{(l)} + \epsilon(t,l), y_1^{(l)}, y_2^{(l)})$, $l = 1, \ldots, s$, in such a way
that no two points $x^{(l)} + \epsilon(t,l)$ are the same, and keep the same weights
$\{p_l\}$. It is clear that $\varphi_t$ satisfies

$$\begin{aligned} 1 &= \mathrm{Prob}[Y = (y_1^{(l)}, y_2^{(l)}) \mid X = x^{(l)} + \epsilon(t,l)] \\ &= \mathrm{Prob}[Y_1 = y_1^{(l)} \mid X = x^{(l)} + \epsilon(t,l)]\;\mathrm{Prob}[Y_2 = y_2^{(l)} \mid X = x^{(l)} + \epsilon(t,l)] \end{aligned}$$

for all $l = 1, \ldots, s$. That is, conditional on $X$, the variables $Y_1$ and $Y_2$ are
independent. Take a sequence with $\epsilon(t,l) \to 0$ for all $l$, as $t \to \infty$, and consider
the moment matrix $M_d^{(t)}$ associated with $\varphi_t$. Clearly, as $t \to \infty$,

$$\int X^i Y_1^j Y_2^k \, d\varphi_t \to \int X^i Y_1^j Y_2^k \, d\mu = y_{ijk} \qquad \forall\, i,j,k : i + j + k \le 2d,$$

that is, $M_d^{(t)} \to M_d$.

Therefore, if the zero in the inverse property $(V)_d$ does not hold for $M_d$,
then by a simple continuity argument, it cannot hold for any $M_d^{(t)}$ with
sufficiently large $t$, and still the conditional independence property holds for
each $\varphi_t$. One has just shown that, for every fixed $d$, one may easily construct
examples of measures with the conditional independence property, which
violate the zero in the inverse property $(V)_d$.
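
The perturbation argument can be illustrated for $d = 1$ with a small discrete measure. The sketch below assumes NumPy; the five atoms, the equal weights, and the shift $\epsilon(t,l) = 0.01\,(l+1)/t$ are hypothetical choices made only for illustration, and the helper names `moment_matrix_d1` and `perturbed` are not from the paper.

```python
import numpy as np

# Hypothetical finitely supported measure on (X, Y1, Y2): five equally
# weighted atoms whose X-values repeat.
atoms = np.array([[0, 0, 0], [0, 1, 1], [1, 0, 1], [1, 1, 0], [1, 1, 1]], dtype=float)
weights = np.full(len(atoms), 1.0 / len(atoms))

def moment_matrix_d1(points, weights):
    """Moment matrix M_1 in the basis (1, X, Y1, Y2)."""
    V = np.hstack([np.ones((len(points), 1)), points])
    return V.T @ (weights[:, None] * V)

def perturbed(points, t):
    """Shift the X-coordinate of atom l by eps(t, l) = 0.01 * (l + 1) / t,
    which makes all X-values distinct and tends to 0 as t grows."""
    out = points.copy()
    out[:, 0] += 0.01 * (np.arange(len(points)) + 1) / t
    return out

M = moment_matrix_d1(atoms, weights)
entry = np.linalg.inv(M)[2, 3]            # (Y1, Y2) entry of the inverse

for t in (1, 10, 100, 1000):
    pts_t = perturbed(atoms, t)
    # Each X-value now carries a single atom, so given X the pair (Y1, Y2)
    # is deterministic and therefore conditionally independent.
    assert len(np.unique(pts_t[:, 0])) == len(pts_t)
    M_t = moment_matrix_d1(pts_t, weights)
    entry_t = np.linalg.inv(M_t)[2, 3]
    print(t, np.abs(M_t - M).max(), entry_t)
```

The printed gap between `M_t` and `M` shrinks with `t`, while the $(Y_1, Y_2)$ entry of the inverse stays away from zero, in line with the continuity argument above.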